Contrastive Language-Image Pretraining (CLIP) has received widespread attention since its learned representations transfer well to various downstream tasks. During CLIP training, the InfoNCE objective aims to align positive image-text pairs and separate negative ones. In this paper, we show a representation grouping effect during this process: the InfoNCE objective indirectly groups semantically similar representations together via randomly emerged within-modal anchors. We introduce Prototypical Contrastive Language-Image Pretraining (ProtoCLIP) to enhance such grouping by boosting its efficiency and increasing its robustness against the modality gap. Specifically, ProtoCLIP sets up prototype-level discrimination between the image and text spaces, which efficiently transfers higher-level structural knowledge. We further propose Prototypical Back Translation (PBT) to decouple representation grouping from representation alignment, resulting in effective learning of meaningful representations under a large modality gap. PBT also enables us to introduce additional external teachers with richer prior knowledge. ProtoCLIP is trained with an online episodic training strategy, which makes it scalable to unlimited amounts of data. Combining the above novel designs, we train ProtoCLIP on Conceptual Captions and achieve a +5.81% ImageNet linear probing improvement and a +2.01% ImageNet zero-shot classification improvement. The code is available at https://github.com/megvii-research/protoclip.
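The abstract refers to the InfoNCE objective that aligns positive image-text pairs and separates negative ones. A minimal numpy sketch of a CLIP-style symmetric InfoNCE loss (function name and temperature value are illustrative, not taken from the paper):

```python
import numpy as np

def infonce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric CLIP-style InfoNCE over a batch of paired embeddings.

    img_emb, txt_emb: (N, D) arrays; row i of each forms a positive pair,
    all other rows serve as negatives.
    """
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature            # (N, N) similarity matrix

    def xent(l):
        # cross-entropy with the diagonal (matched pairs) as targets
        l = l - l.max(axis=1, keepdims=True)      # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_p))

    # average the image-to-text and text-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Perfectly aligned pairs drive the loss toward zero, while mismatched pairs with strong negatives drive it up, which is the alignment/separation behavior the abstract describes.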
translated by Google Translate
Flood disasters cause enormous social and economic losses. However, both traditional physics-based models and learning-based flood forecasting models require massive historical flood data to train their parameters. For a new site without sufficient historical data, model performance drops dramatically due to overfitting. This technical report presents a Flood Domain Adaptation Network (FloodDAN), a baseline for applying unsupervised domain adaptation (UDA) to the flood forecasting problem. Specifically, the training of FloodDAN includes two stages: in the first stage, we train a rainfall encoder and a prediction head to learn general, transferable hydrological knowledge from large-scale source-domain data; in the second stage, we transfer the knowledge in the pretrained encoder into the target-domain rainfall encoder via adversarial domain alignment. During inference, we use the target-domain rainfall encoder trained in the second stage and the prediction head trained in the first stage to obtain flood predictions. Experimental results on the Tunxi and Changhua flood datasets show that FloodDAN can perform effective flood forecasting with zero target-domain supervision, on par with supervised models that use 450-500 hours of supervision.
Face recognition has long been an active research area in artificial intelligence, especially since the rise of deep learning in recent years. In some practical scenarios, each identity has only a single sample available for training. Face recognition under this condition is referred to as single-sample face recognition and poses significant challenges to the effective training of deep models. Therefore, in recent years researchers have attempted to unleash more of deep learning's potential and improve recognition performance in the single-sample case. Although several comprehensive surveys of traditional single-sample face recognition methods exist, these reviews rarely cover the emerging deep-learning-based methods. Accordingly, this paper focuses on deep-learning-based methods, classifying them into virtual-sample methods and generic learning methods. In the former category, virtual images or virtual features are generated to benefit the training of deep models. In the latter, additional multi-sample generic sets are used. There are three types of generic learning methods: combining traditional methods with deep features, improving the loss function, and improving the network structure, all of which are covered in our analysis. Furthermore, we review the face datasets commonly used to evaluate single-sample face recognition models and compare the results of the different types of models. We also discuss the problems of existing single-sample face recognition methods, including identity-information preservation in virtual-sample methods and domain adaptation in generic learning methods. Finally, we argue that developing unsupervised methods is a promising future direction and point out that the semantic gap is an important issue requiring further consideration.
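The survey's first category generates virtual samples so that a deep model sees more than one image per identity. A toy numpy illustration of that idea (the specific augmentations, flips and Gaussian jitter, are stand-ins and not drawn from any surveyed method):

```python
import numpy as np

def make_virtual_samples(image, n=4, noise_std=0.02, seed=0):
    """Expand a single gallery image into virtual training samples.

    A deliberately simple stand-in for virtual-sample methods: horizontal
    flips and small Gaussian perturbations mimic intra-class variation.
    image: (H, W) array with values in [0, 1].
    """
    rng = np.random.default_rng(seed)
    samples = [image]
    for i in range(n):
        v = image[:, ::-1] if i % 2 else image            # alternate flips
        v = v + rng.normal(0.0, noise_std, size=v.shape)  # photometric jitter
        samples.append(np.clip(v, 0.0, 1.0))              # stay in [0, 1]
    return np.stack(samples)
```

Real virtual-sample methods in the survey use learned generators rather than fixed augmentations, but the training-set expansion they provide has this shape.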
The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
The rich longitudinal individual-level data available from electronic health records (EHRs) can be used to examine treatment-effect heterogeneity. However, estimating treatment effects with EHR data poses several challenges, including time-varying confounding; repeated, temporally non-aligned measurements of covariates, treatment assignments, and outcomes; and loss to follow-up due to dropout. Here we develop the Subgroup Discovery for Longitudinal Data (SDLD) algorithm, a tree-based, general data-driven method for discovering subgroups with heterogeneous treatment effects in longitudinal studies, which combines the interaction tree algorithm with longitudinal targeted maximum likelihood estimation. We apply the algorithm to EHR data to discover subgroups of people living with human immunodeficiency virus (HIV) who are at higher risk of weight gain when receiving dolutegravir-containing antiretroviral therapies (ARTs) compared with non-dolutegravir-containing ARTs.
Tremendous progress has been made on face detection in recent years using convolutional neural networks. While many face detectors use designs specialized for detecting faces, we treat face detection as a generic object detection task. We implement a face detector based on the YOLOv5 object detector and call it YOLO5Face. We make several key modifications to YOLOv5 and optimize it for face detection: adding a five-point landmark regression head, using a stem block in the backbone, using smaller-size kernels in the SPP module, and adding a P6 output in the PAN block. We design detectors of different model sizes, from an extra-large model that achieves the best performance to a super-small model for real-time detection on embedded or mobile devices. Experimental results on the WiderFace dataset show that on VGA images our face detectors achieve state-of-the-art performance on almost all of the Easy, Medium, and Hard subsets, exceeding more complex designated face detectors. The code is available at https://github.com/deepcam-cn/yolov5-face
Elevator button recognition is a key function for enabling autonomous operation of elevators. However, challenging imaging conditions and various image distortions make it difficult to recognize buttons accurately. To fill this gap, we propose a novel deep-learning-based method that automatically corrects perspective distortion in elevator button images based on button-corner detection results. First, we obtain button segmentation and button-corner detection results by leveraging a novel image segmentation model and the Hough transform. Then, the pixel coordinates of the standard button corners are used as reference features to estimate the camera motion for correcting the perspective distortion. Fifteen elevator button images captured from different viewpoints form the dataset. Experimental results show that the proposed method can estimate camera motion and remove perspective distortion from elevator button images with high accuracy.
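The correction step maps detected button corners onto their standard positions. A minimal numpy sketch of that mapping via a direct linear transform (DLT) homography; the corner coordinates below are made up for illustration, and the paper's own pipeline may estimate camera motion differently:

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography mapping src -> dst from 4+ point pairs
    using the direct linear transform (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the homography is the null vector of A (last right-singular vector)
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Apply H to an (N, 2) array of points (homogeneous divide included)."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

# Detected (distorted) button corners vs. the standard rectified layout
detected = np.array([[120., 80.], [310., 95.], [300., 270.], [115., 255.]])
standard = np.array([[0., 0.], [200., 0.], [200., 200.], [0., 200.]])
H = homography_from_points(detected, standard)
```

In practice the resulting `H` would be passed to an image-warping routine (e.g. OpenCV's `cv2.warpPerspective`) to produce the rectified button image.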
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
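GarDA's key constraint is adapting to a sequence of domains without retaining previously seen data, which generative replay makes possible. A toy numpy sketch of the replay bookkeeping; the per-feature Gaussian "generator" here is a deliberately simplified stand-in for the paper's generative appearance model:

```python
import numpy as np

class AppearanceReplay:
    """Toy generative-replay buffer: each consolidated domain is summarized
    by per-feature Gaussian statistics instead of stored samples, so no
    previously seen data needs to be retained."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.domain_stats = []                     # (mean, std) per domain

    def consolidate(self, features):
        """Summarize a finished domain's (N, D) feature matrix."""
        self.domain_stats.append((features.mean(0), features.std(0) + 1e-8))

    def replay(self, n_per_domain):
        """Draw pseudo-features for every previously seen domain."""
        draws = [self.rng.normal(m, s, size=(n_per_domain, m.shape[0]))
                 for m, s in self.domain_stats]
        return np.concatenate(draws, axis=0)

def adaptation_batch(replay_buffer, new_domain_features, n_replay=32):
    """Mix the new domain's unlabeled data with replayed appearances,
    consolidating information across domains as in continual adaptation."""
    return np.concatenate([new_domain_features,
                           replay_buffer.replay(n_replay)], axis=0)
```

The real method trains a generative model per consolidation step, but the data flow, summarize a domain, discard it, replay it later, is the same.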
The development of social media user stance detection and bot detection methods relies heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built from the largest original dataset in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
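The finding that graph-based detectors improve when introducing multiple relations can be illustrated with one relational aggregation layer. A minimal numpy sketch in the style of a relational GCN, with a separate adjacency matrix per relation type; this is a generic illustration, not MGTAB's evaluated architecture:

```python
import numpy as np

def multi_relational_layer(X, adjs, weights):
    """One R-GCN-style layer: aggregate user features separately over each
    relation's adjacency matrix, then sum the per-relation messages.

    X:       (N, D) user feature matrix.
    adjs:    list of (N, N) adjacency matrices, one per relation type
             (e.g. follower, friend, mention, ...).
    weights: list of (D, D_out) projection matrices, one per relation.
    """
    out = np.zeros((X.shape[0], weights[0].shape[1]))
    for A, W in zip(adjs, weights):
        deg = A.sum(axis=1, keepdims=True)
        deg[deg == 0] = 1.0                  # isolated nodes: avoid div-by-zero
        out += (A / deg) @ X @ W             # mean-aggregate over this relation
    return np.maximum(out, 0.0)              # ReLU
```

Feature-based baselines correspond to dropping the `adjs` term entirely; the per-relation sum is what lets a detector exploit MGTAB's 7 relationship types instead of a single collapsed graph.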
As one of the prevalent methods to achieve automation systems, Imitation Learning (IL) presents promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explaining framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from randomly masked demonstrations and uses the conventional evaluation outcome, environment returns, as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning frames' importance equality, the effectiveness of the importance map, and connections between importance maps from different IL models. The results show that R2RISE successfully distinguishes important frames from the demonstrations.
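The importance map described above is a RISE-style return-weighted aggregation over random masks. A minimal numpy sketch of that aggregation step, with the expensive retrain-and-evaluate loop abstracted into given per-mask return values (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def importance_map(masks, returns):
    """RISE-style importance estimate over demonstration frames.

    masks:   (M, T) binary array; masks[j, t] = 1 means frame t was kept
             in the j-th masked demonstration set.
    returns: (M,) environment return obtained after retraining the IL
             model on each masked demonstration set.

    A frame's importance is the return-weighted frequency with which it
    was kept, normalized by how often it was kept at all.
    """
    masks = np.asarray(masks, dtype=float)
    returns = np.asarray(returns, dtype=float)
    kept = masks.sum(axis=0)
    kept[kept == 0] = 1.0        # never-kept frames contribute 0 anyway
    return (returns @ masks) / kept
```

Frames that consistently appear in high-return masked sets score high, which is how the map separates important frames from filler.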